Abstract. Let $C$ be a finite set of $n$ elements and $\mathcal{R} = \{R_1, R_2, \ldots, R_m\}$
a family of $m$ subsets of $C$. The family $\mathcal{R}$ verifies the consecutive ones
property if there exists a permutation $P$ of $C$ such that each $R_i$ in $\mathcal{R}$
is an interval of $P$. There already exist several algorithms to test this
property in $O(\sum_{i=1}^{m} |R_i|)$ time, all being involved. We present a simpler
algorithm, based on a new partitioning scheme.

1 Introduction 

Let $C = \{c_1, \ldots, c_n\}$ be a finite set of $n$ elements and
$\mathcal{R} = \{R_1, R_2, \ldots, R_m\}$ a family of $m$ subsets of $C$. Those sets
can be seen as a 0-1 matrix, where $C$ represents the columns and each $R_i$ the
ones of row $i$. Figure 1 shows such a matrix.

[Figure 1: 0-1 matrix with columns $c_1, \ldots, c_8$ and rows $R_1, \ldots, R_{10}$,
shown with its PQ-tree and its overlap classes.]

Fig. 1. A matrix verifying the consecutive ones property, its associated PQ-tree 
and the information contained in overlap classes. In the PQ-tree, Q nodes are 
represented by boxes, while P nodes by circles. 

The family $\mathcal{R}$ verifies the consecutive ones property (C1P) if there exists a
permutation $P$ of $C$ such that each $R_i$ in $\mathcal{R}$ is an interval of $P$. For
instance, the family given by the matrix of Figure 1 verifies C1P. Efficiently
testing C1P has received a lot of attention in the literature because this
problem is strongly related to the recognition of interval graphs, the
recognition of planar graphs, modular decomposition and other graph
decompositions. The consecutive ones property is the core of many other
algorithms that have applications in a wide range of domains, from VLSI circuit
design through planar embeddings [10] to computational biology for the
reconstruction of a chromosome from a set of contigs [3]. We denote
$|\mathcal{R}| = \sum_{i=1}^{m} |R_i|$. Several $O(|\mathcal{R}|)$ time algorithms have
been proposed to test this property, following five main approaches.
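
To make the definition of C1P concrete, it can be checked by brute force on very small instances. The Python sketch below is only an illustration of the definition: it is exponential in $n$, unlike the linear-time algorithms discussed in this paper, and the names `is_interval` and `find_c1p_order` as well as the sample rows are assumptions chosen for the example.

```python
from itertools import permutations


def is_interval(row, order):
    """True if the columns of `row` occupy consecutive positions in `order`."""
    positions = sorted(order.index(c) for c in row)
    return positions[-1] - positions[0] + 1 == len(positions)


def find_c1p_order(columns, rows):
    """Return a column order making every row an interval, or None."""
    for order in permutations(columns):
        if all(is_interval(row, order) for row in rows):
            return order
    return None


# Hypothetical sample family over the columns b..h.
rows = [set("bcd"), set("cdefgh"), set("de"), set("efgh")]
print(find_c1p_order("bcdefgh", rows))                 # a valid column order
print(find_c1p_order("bcdefgh", rows + [set("bh")]))   # None: no order exists
```

The second call fails because the row {b, h} cannot be made an interval together with {b, c, d} and {c, d, e, f, g, h}.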

The first approach and still the most well known one is the use of PQ-tree 
structure pQ. A PQ-tree is a tree that represents a set of permutations defined 
by the possible orders of its leaves obtained by changing the order of the children 
of any internal node depending on its type, which can be P or Q. For a P node,
any order of its children is valid, while for a Q node only the complete reversal
of its children is accepted. For instance, in Figure 1 the PQ-tree represents the
order $c_4 c_2 c_6 c_1 c_3 c_7 c_5 c_8$, but also $c_4 c_2 c_6 c_7 c_3 c_1 c_5 c_8$,
$c_4 c_6 c_2 c_7 c_3 c_1 c_8 c_5$, and so on.
The main point for using PQ-trees is that if a family verifies C1P, then one can
build a PQ-tree representing exactly all column orders for which the C1P will be
verified. For instance, the PQ-tree in Figure 1 represents all orders for which the
family given by the matrix at its right verifies C1P. If a family does not verify
C1P, its associated PQ-tree is said to be empty.
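
The way a PQ-tree encodes a set of permutations can be sketched in a few lines of Python. The nested-tuple encoding `(kind, children)` and the function name `frontiers` are assumptions made for this illustration, and the small tree used below is hypothetical (it is not the tree of Figure 1).

```python
from itertools import permutations, product


def frontiers(node):
    """Yield every leaf order allowed by a PQ-tree.

    A leaf is a string; an internal node is a pair (kind, children)
    with kind "P" (any order of the children) or "Q" (given order or
    its complete reversal).
    """
    if isinstance(node, str):
        yield (node,)
        return
    kind, children = node
    if kind == "P":
        arrangements = permutations(children)
    else:
        arrangements = [tuple(children), tuple(reversed(children))]
    for arrangement in arrangements:
        choices = [list(frontiers(child)) for child in arrangement]
        for combo in product(*choices):
            yield tuple(leaf for part in combo for leaf in part)


# Hypothetical tree: a Q node whose middle child is a P node.
tree = ("Q", ["a", ("P", ["b", "c"]), "d"])
orders = set(frontiers(tree))
```

Here `orders` contains the four leaf orders abcd, acbd, dbca and dcba.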

Given a family, in order to build its associated PQ-tree, each row is inserted
one after the other in the tree while the PQ-tree is not empty. This update is
done through a procedure called Refine whose complexity is amortized on the
size of the tree. The main drawback of this approach is that the implementation
of Refine in its linear time complexity is still a challenge. It uses a series of 11
templates depending on the form of the tree, and choosing which to use in
constant time is a huge programming difficulty that has only slightly been reduced
by Young [11] using a recursive Refine that allows us to reduce the number of
templates. Moreover, extracting from this approach a certificate that the family
really does not verify C1P is hard. Therefore, given a PQ-tree implementation,
one can hardly be confident either in its validity or in its time complexity.
This is the reason why many other algorithmic approaches have been attempted
to test C1P using simpler and/or certified algorithms.

One of those attempts consists in first transforming the C1P testing problem
to interval graph recognition by adding fake rows and then using a special LexBFS
traversal that produces a first order on $C$ that has some special properties [5]. A
recursive partitioning phase is then necessary following both this LexBFS order
and an order on the rows derived from a clique tree built from the LexBFS
traversal. This approach is also complex, both to understand and to program,
and surprisingly the links between these first two approaches are not that clear.

A third approach was to try to design the PC-tree [8], an easier structure
to refine than the PQ-tree. However, as Haeupler and Tarjan noticed in [6],
the authors of [5] did not consider "implementations issues" (sic) that lead to
incorrect algorithms for C1P testing and planar graph recognition.

A fourth approach appeared in [7] with the idea of simplifying the C1P test
by avoiding PQ-trees. However, the algorithm remains very involved.

A last and more recent approach has been presented by R. McConnell in [5].
This approach is a breakthrough in the understanding of the intrinsic constraints
of C1P and the real nature of the PQ-tree. We describe this approach in detail
since our method is a tricky simplification of it. McConnell shows that each Q
node of the PQ-tree represents in fact an overlap class of a subset of the rows.
Two rows $R_i$ and $R_j$ of $\mathcal{R}$ overlap if $R_i \cap R_j \neq \emptyset$,
$R_i \setminus R_j \neq \emptyset$, and $R_j \setminus R_i \neq \emptyset$.
An overlap class is an equivalence class of the overlap relation, that is, two rows
$R_i$ and $R_j$ are in the same class if there is a chain of overlaps from $R_i$ to $R_j$.
For instance, the two non-trivial overlap classes of the family example given
by the matrix of Figure 1 are shown on the same figure on the right. Overlap
classes partition the set of rows and form a laminar family, and thus they can
be organized in an inclusion tree.
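
The overlap relation and the overlap classes defined above can be illustrated by a naive Python sketch. It compares all pairs of rows, so it is quadratic and is not the linear-time method discussed later; the names `overlap` and `overlap_classes`, the sample rows and the extra row `S` are assumptions made for the example.

```python
def overlap(a, b):
    """Two sets overlap if they intersect and neither contains the other."""
    return bool(a & b) and bool(a - b) and bool(b - a)


def overlap_classes(rows):
    """Connected components of the overlap graph (naive, quadratic).

    `rows` maps a row name to its set of columns.
    """
    names = list(rows)
    seen, classes = set(), []
    for start in names:
        if start in seen:
            continue
        seen.add(start)
        component, stack = [], [start]
        while stack:
            r = stack.pop()
            component.append(r)
            for other in names:
                if other not in seen and overlap(rows[r], rows[other]):
                    seen.add(other)
                    stack.append(other)
        classes.append(sorted(component))
    return classes


# Hypothetical sample family; S is contained in R3 and R5, so it overlaps nothing.
rows = {"R2": set("bcd"), "R3": set("cdefgh"), "R4": set("de"),
        "R5": set("efgh"), "R7": set("bh"), "S": set("fg")}
classes = overlap_classes(rows)
```

On this family the sketch finds one class of five rows and the trivial class made of `S` alone.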

This tree is the skeleton of the PQ-tree, and the remaining P nodes might also
be derived from the overlap classes. However, for an equivalence class to be
a node of the PQ-tree, it also has to verify the consecutive ones property. Thus,
where is the gain? The trick used by McConnell is that verifying the C1P of
an overlap class is independent of the other overlap classes and somehow easier
provided a spanning tree of the overlap graph of the class. Using a partitioning
approach guided by this tree, it is linear in the total size of the rows in an overlap
class to test if this overlap class verifies C1P. Consequently, by testing overlap
classes one after the other, one can verify if the whole set $\mathcal{R}$ fulfills C1P in
$O(|\mathcal{R}|)$ time. The technical complexity of the approach is twofold: computing
(a) the overlap classes and (b) a spanning tree of each class.

Point (a) is performed in [5] through an algorithm of Dahlhaus published as
a routine of [1] used for undirected graph split decomposition. It is considered
by McConnell as a black box that takes $\mathcal{R}$ as input and returns a list of overlap
classes and, for each overlap class, the list of rows that belong to it.

Point (b) is then computed in [3] for each overlap class by a complex add-on
from the list of rows in the class.

In this article we present a simplification of this last approach by introducing
a new partitioning scheme. It should be noted first that McConnell's approach
can already be very slightly simplified using existing tools. Indeed, the
algorithm of Dahlhaus for computing overlap classes is an algorithmic pearl that
has been recently simplified and made practical to implement, in the sense that the
original version uses an LCA while the simplified version presented in [5] only
uses partitioning. Moreover, a modification of Dahlhaus's approach allows us to
extract a spanning tree of each overlap class. This modification is not obvious
but remains simpler than the add-on of [5]. However, building a spanning tree
from Dahlhaus is intrinsically difficult, because the two concepts are somehow
antinomic: Dahlhaus's approach maintains some ambiguities in the row overlaps
that save time in the overall computation, while computing a spanning
tree requires solving most of these ambiguities, which is sometimes difficult. In
this paper, we successfully maintain these ambiguities even in the partitioning
phase, avoiding building a spanning tree.

To clearly present our approach let us consider the difference between the
PQ-tree approach and that of McConnell in terms of partitioning. The PQ-tree
records a partition of $C$ induced by the rows even if some rows can be included in
others (a row might not cut any class of the partition). The difficulty arises when
updating the structure: at the same time we need to update both a partition
and an inclusion tree that are intrinsically merged. In the second approach the
idea is to impose that each row added surely overlaps a previous one, which
simplifies the partitioning since the inclusion tree does not have to be maintained.
This also ensures the linear time complexity without any need for amortization, but
at the cost of the computation of a spanning tree of each overlap class.

Our approach lies in between. For each overlap class we update a partition,
but we also allow some fail and swap in the partitioning scheme. We compute
an order that guarantees that when adding a new row $R_1$, if it does not overlap
any row already considered, then the following row $R_2$ will, and moreover $R_1$
overlaps $R_2$ and will be considered next. We thus swap $R_1$ and $R_2$ in the order
in which we update the partition if $R_1$ does not cut. We call this order a "swap
overlap order". This order could of course be obtained from a spanning tree, but we
explain below how we can compute such an order at a very small computational
price by entering deeper into Dahlhaus's algorithm, which we also slightly simplify
for our needs. Our algorithm thus runs in 3 main steps: (1) the computation of
each overlap class using an algorithm close to that of Dahlhaus, (2) for each class,
the computation of a swap overlap order, and (3) the partitioning of each class
guided by this order using a new partitioning scheme. If the partitioning fails on
a class, the C1P is not verified. Steps 1 and 2 are performed at the same time, but
for clarity we present them as two distinct steps.

This article is organized as follows. In Section 2 we present two
variations of Dahlhaus's algorithm for computing overlap classes. In Section 3
we explain our main notion of swap overlap order and explain how to slightly
modify Dahlhaus's algorithm to generate such an order for each overlap class.
In Section 4 we finally explain how to test C1P on each overlap class using
the swap overlap order associated with it. We added two appendices. The first is an
example of the construction of a swap overlap order. The second is a technical
routine used in Dahlhaus's algorithm revisited in [2] that we mainly recall.



2 Computing Overlap Classes 

In this section we recall and slightly modify the algorithm of Dahlhaus for
computing overlap classes already simplified and presented in [2]. The computational
difficulty of efficiently computing the overlap classes comes from the fact that the
underlying overlap graph, where the $R_i$ are the vertices and $(R_i, R_j)$ is an edge if
$R_i$ overlaps $R_j$, might have $\Theta(|\mathcal{R}|^2)$ edges and thus a size quadratic
in $|\mathcal{R}|$. An overlap class is a connected component of this graph.

Let $LR$ be the list of all $R \in \mathcal{R}$ sorted in decreasing size order. The ordering
of sets of equal size is arbitrarily fixed, and thus $LR$ is a total order. Given
$R \in \mathcal{R}$, we denote by $\mathrm{Max}(R)$ the largest row $X \in \mathcal{R}$ taken in $LR$
order such that $X <_{LR} R$ and $X$ overlaps $R$. This definition is modified from that
in [2] to consider the order $LR$ in the definition of $\mathrm{Max}(R)$.



Note that $\mathrm{Max}(R)$ might be undefined for some sets of $\mathcal{R}$. In this latter
case, in order to simplify the presentation of some technical points, we write
$\mathrm{Max}(R) = \emptyset$. Dahlhaus's algorithm is based on the following observation:

Lemma 1 ([4,2]). Let $R \in \mathcal{R}$ such that $\mathrm{Max}(R) \neq \emptyset$. Then for all
$X \in \mathcal{R}$ such that $X \cap R \neq \emptyset$ and $|R| \le |X| \le |\mathrm{Max}(R)|$,
$X$ overlaps $R$ or $\mathrm{Max}(R)$.
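
The order $LR$, the function $\mathrm{Max}$ and the statement of Lemma 1 can be checked on a small family with the following Python sketch. It computes $\mathrm{Max}(R)$ by a direct scan of $LR$ (not in constant time as in the appendix routine), and the names `lr_order`, `max_row`, `lemma1_holds` and the sample rows are assumptions made for the illustration; `None` stands for $\mathrm{Max}(R) = \emptyset$.

```python
def overlap(a, b):
    return bool(a & b) and bool(a - b) and bool(b - a)


def lr_order(rows):
    """Rows sorted by decreasing size; ties keep their input order."""
    return sorted(rows, key=lambda name: -len(rows[name]))


def max_row(rows, lr, name):
    """First row before `name` in LR that overlaps it, or None."""
    for other in lr:
        if other == name:
            return None
        if overlap(rows[other], rows[name]):
            return other
    return None


def lemma1_holds(rows):
    """Check the statement of Lemma 1 on every row of the family."""
    lr = lr_order(rows)
    for r in rows:
        m = max_row(rows, lr, r)
        if m is None:
            continue
        for x in rows:
            in_range = len(rows[r]) <= len(rows[x]) <= len(rows[m])
            if rows[x] & rows[r] and in_range:
                if not (overlap(rows[x], rows[r]) or overlap(rows[x], rows[m])):
                    return False
    return True


# Hypothetical sample family.
rows = {"R2": set("bcd"), "R3": set("cdefgh"), "R4": set("de"),
        "R5": set("efgh"), "R7": set("bh")}
lr = lr_order(rows)
maxes = {r: max_row(rows, lr, r) for r in rows}
```

On this family, `lr` is R3, R5, R2, R4, R7, and `maxes` maps R2 and R7 to R3, R4 to R5, while R3 and R5 have no Max.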

The trick we propose below for computing the overlap order of each overlap
class is also based on Lemma 1.

Let us assume first that we have already computed all $\mathrm{Max}(R)$. For each column
$c \in C$ we compute the list $SL(c)$ of all sets $R \in \mathcal{R}$ to which $c$ belongs. This
list is sorted in increasing order of the sizes of the sets respecting $LR$, thus in
decreasing order in $LR$. Computing and sorting all lists for all $c \in C$ can be done
in $O(|\mathcal{R}|)$ time using a stable bucket sort.

Dahlhaus's overlap class identification is built on those lists. For all $c \in C$,
let $R$ be a set containing $c$ such that $\mathrm{Max}(R) \neq \emptyset$. We define a new
interval on $SL(c)$ beginning in $R$, continuing from $R$ in the order of $SL(c)$ and
finishing by the greatest row $Y$ in $SL(c)$ such that $|Y| \le |\mathrm{Max}(R)|$. Notice that
this greatest row $Y$ is not necessarily equal to $\mathrm{Max}(R)$. If it is, the interval is
said to be of type M (for Max included), and of type E (for External) otherwise.
Given an interval $I$, $\mathrm{First}(I)$ is the first row of the interval, thus the row which
generates the interval.

We "bucket" sort the intervals in a table $TI[1..m]$ of $m$ entries in the following
way. An interval $I = [R_{i_1} \ldots R_{i_k}]$ is added to all $TI[i_j]$, $1 \le j \le k$.

An example of a family and its associated intervals is shown in Figure 2.

TI

 1   E1 E2
 2   M3 M5 E4 E2
 3   M3 M5 M7
 4   M6 E4
 5   M6 M7
 6
 7   M7 E2
 8   M9
 9   M8 E2
10   M9
11   M8 E1

Fig. 2. Example: a family $\mathcal{R}$, its corresponding sets $SL$ and the associated $TI$
table. Intervals of type M are denoted by a plain line, while intervals of type E
are denoted by a dashed one.

To compute overlap classes, we mark them one after the other, keeping the
number of the overlap class each row belongs to in a table $NC[1..m]$, all
initialized to 0.

Algorithm 1: computing all overlap classes

1. Initialize the counter $nc = 1$ to count the overlap class we are tagging;

2. Choose an arbitrary $l$, $1 \le l \le m$, such that there exists at least one interval in
$TI[l]$;

3. For all interval(s) $I = [R_{i_1} \ldots R_{i_k}]$ in $TI[l]$:

(a) remove all occurrences of $I$ out of $TI$;

(b) mark each row in $I$ to belong to overlap class $nc$, thus $NC[i_j] = nc$,
$1 \le j \le k$;

(c) recurse this algorithm from step 3 on all $i_j$, $1 \le j \le k$, such that $TI[i_j]$
is not empty;

(d) end the recursive procedure;

4. Increment $nc$ and apply step 2 while $TI$ is not empty.

Rows that are not marked during this algorithm each form an overlap
class of a single element that does not need to be considered further for testing
C1P. We focus below on overlap classes that contain at least 2 rows.

By Lemma 1, all rows in a given interval belong to the same overlap class.
We now prove that Algorithm 1 computes all overlap classes.

First, assume that two rows $R_i$ and $R_j$ are such that $NC[R_i] = NC[R_j]$. Then
the two rows have been marked during a recursive call of Step 3 that recurses on
each interval containing a row. Thus the whole process computes the closure of
belonging to a same interval, which guarantees that the two rows are linked by
a chain of overlap(s).

Secondly, assume that two rows $R_1$ and $R_2$ overlap. Let us consider wlog that
$R_2 <_{LR} R_1$. Then $\mathrm{Max}(R_1)$ exists and, as $R_1$ and $R_2$ intersect on at least one
column $c$, $R_2$ is in an interval beginning in $R_1$ on $SL(c)$. We thus proved that:

Proposition 1 ([4]). Algorithm 1 computes all overlap classes of $\mathcal{R}$.
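
The construction of the intervals on the lists $SL(c)$ and the resulting overlap classes can be sketched as follows in Python. This is a simplified illustration under stated assumptions, not the linear-time procedure: $\mathrm{Max}$ is computed by a direct scan, the recursion of Algorithm 1 over $TI$ is replaced by a union-find closure over the intervals (which computes the same closure), and the names `interval_classes`, `max_row` and `find` as well as the sample rows are hypothetical.

```python
def overlap(a, b):
    return bool(a & b) and bool(a - b) and bool(b - a)


def interval_classes(rows):
    """Return (intervals, classes) for `rows`: dict name -> set of columns."""
    lr = sorted(rows, key=lambda r: -len(rows[r]))      # LR order (stable)
    rank = {r: i for i, r in enumerate(lr)}

    def max_row(r):
        for x in lr[:rank[r]]:
            if overlap(rows[x], rows[r]):
                return x
        return None

    mx = {r: max_row(r) for r in rows}
    intervals = []
    for c in sorted(set().union(*rows.values())):
        # SL(c): rows containing c, in decreasing LR order (increasing size).
        sl = sorted((r for r in rows if c in rows[r]), key=lambda r: -rank[r])
        for start, r in enumerate(sl):
            if mx[r] is None:
                continue
            interval = [r]
            for y in sl[start + 1:]:
                if len(rows[y]) > len(rows[mx[r]]):
                    break
                interval.append(y)
            kind = "M" if interval[-1] == mx[r] else "E"
            intervals.append((kind, c, interval))

    # Closure of "belonging to a same interval" with a small union-find.
    parent = {r: r for r in rows}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for _, _, interval in intervals:
        for y in interval[1:]:
            parent[find(y)] = find(interval[0])
    groups = {}
    for r in rows:
        groups.setdefault(find(r), set()).add(r)
    return intervals, {frozenset(g) for g in groups.values()}


# Hypothetical sample family; S overlaps no other row.
rows = {"R2": set("bcd"), "R3": set("cdefgh"), "R4": set("de"),
        "R5": set("efgh"), "R7": set("bh"), "S": set("fg")}
intervals, classes = interval_classes(rows)
```

On this family the interval generated by R7 on column h is [R7, R5, R3] and is of type M, while the interval generated by R4 on column d is [R4, R2] and is of type E.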

Worst case complexity of Algorithm 1. Algorithm 1 can be implemented to run
in $O(|\mathcal{R}|)$, provided that for a given row $R$ computing $\mathrm{Max}(R)$ takes $O(1)$ time
(see Appendix B for details on this computation).

So far we have a general scheme for computing all overlap classes of
$\mathcal{R}$ that is directly adapted from [4,2]. We now modify this approach to consider
the two types M and E of intervals successively for each row, beginning with
intervals of type M and then intervals of type E.

Algorithm 2: the computation of all overlap classes revisited

1. Initialize the counter $nc = 1$ to count the overlap class we are tagging;

2. Choose an arbitrary $l$, $1 \le l \le m$, such that there exists at least one interval in
$TI[l]$ of type M;

3. For all interval(s) $I = [R_{i_1} \ldots R_{i_k}]$ of type M in $TI[l]$:

(a) remove all occurrences of $I$ out of $TI$;

(b) mark each row in $I$ to belong to overlap class $nc$, thus $NC[i_j] = nc$,
$1 \le j \le k$;

(c) recurse this algorithm from step 3 on all $i_j$, $1 \le j \le k$, such that $TI[i_j]$
is not empty;

4. For all interval(s) $J = [R_{i_1} \ldots R_{i_k}]$ of type E in $TI[l]$:

(a) remove all occurrences of $J$ out of $TI$;

(b) mark each row in $J$ to belong to overlap class $nc$, thus $NC[i_j] = nc$,
$1 \le j \le k$;

(c) recurse this algorithm from step 3 on all $i_j$, $1 \le j \le k$, such that $TI[i_j]$
is not empty;

(d) end the recursive procedure;

5. Increment $nc$ and apply step 2 while $TI$ is not empty.

Algorithm 2 is still valid since (a) it is a simple modification of Algorithm 1
only considering two types of intervals and (b) in each overlap class there exists
at least one interval of type M to begin with at step 2.

3 Swap Overlap Order 

A swap overlap order is an order $R_{i_1} \ldots R_{i_k}$ on the rows of an overlap class
such that, for all $2 \le l \le k$, at least one of the two following cases is true:

- $R_{i_l}$ overlaps one $R_{i_g}$, $1 \le g < l$,

- $l < k$ and $R_{i_{l+1}}$ overlaps one $R_{i_g}$, $1 \le g < l$, and $R_{i_l}$ overlaps $R_{i_{l+1}}$.
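
The definition can be turned directly into a small checker. The Python sketch below is a naive quadratic test of the two cases, given only to illustrate the definition; the name `is_swap_overlap_order` and the sample rows are assumptions made for the example.

```python
def overlap(a, b):
    return bool(a & b) and bool(a - b) and bool(b - a)


def is_swap_overlap_order(order):
    """Check the two cases of the definition for every position l >= 2."""
    rows = [set(r) for r in order]
    k = len(rows)
    for l in range(1, k):                  # 0-based index of the row R_{i_l}
        earlier = rows[:l]
        if any(overlap(rows[l], r) for r in earlier):
            continue                       # first case
        second_case = (l + 1 < k
                       and any(overlap(rows[l + 1], r) for r in earlier)
                       and overlap(rows[l], rows[l + 1]))
        if not second_case:
            return False
    return True


# Hypothetical sample rows.
R2, R3, R4, R5, R7 = "bcd", "cdefgh", "de", "efgh", "bh"
print(is_swap_overlap_order([R2, R3, R4, R5, R7]))   # True, no swap needed
print(is_swap_overlap_order([R2, R5, R4, R3, R7]))   # True, R5 relies on R4
print(is_swap_overlap_order([R2, R5, R3, R4, R7]))   # False
```

In the second order, R5 overlaps no earlier row, but the next row R4 overlaps R2 and R5 overlaps R4, so the second case applies. In the third order, R5 is followed by R3, which contains it, so neither case holds.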

We now modify Algorithm 2 to output for each overlap class a swap overlap 
order. 

Algorithm 3: outputting a swap overlap order for all overlap classes

1. Initialize the counter $nc = 1$ to count the overlap class we are tagging;
initialize $O_{nc}$ to the empty word $\varepsilon$;

2. Choose an arbitrary $l$, $1 \le l \le m$, such that there exists at least one interval in
$TI[l]$ of type M;

3. For all interval(s) $I = [R_{i_1} \ldots R_{i_k}]$ of type M in $TI[l]$:

(a) remove all occurrences of $I$ out of $TI$;

(b) concatenate to $O_{nc}$ successively the rows $R_{i_1}, R_{i_k}, R_{i_2}, \ldots, R_{i_{k-1}}$ in
this order, adding a row only if $NC[i_j] = 0$. After adding a row, change
$NC[i_j]$ to $nc$;

(c) recurse this algorithm from step 3 on all $i_j$, $1 \le j \le k$, such that $TI[i_j]$
is not empty;

4. For all interval(s) $J = [R_{i_1} \ldots R_{i_k}]$ of type E in $TI[l]$:

(a) remove all occurrences of $J$ out of $TI$;

(b) recurse step 3 on $TI[i_1]$;

(c) concatenate to $O_{nc}$ successively the rows $R_{i_2}, R_{i_3}, \ldots, R_{i_k}$ in this order,
adding a row only if $NC[i_j] = 0$. After adding a row, change $NC[i_j]$ to
$nc$;

(d) recurse this algorithm from step 3 on all $i_j$, $1 \le j \le k$, such that $TI[i_j]$
is not empty;

(e) end the recursive procedure;

5. Increment $nc$ and apply step 2 while $TI$ is not empty.

The main difference with Algorithm 2 in terms of recursive calls is step 4.(b),
where we first recurse on $\mathrm{First}(J)$ when considering an interval of type E before
processing the interval itself.

A trace of the execution of Algorithm 3 is given in Appendix A, together with
the swap overlap order it returns for the largest overlap class of our current example.

What is the idea behind Algorithm 3? We begin an order by considering
an interval of type M, say $I = [R_{i_1} \ldots R_{i_k}]$. By placing $R_{i_1}$ and then
$R_{i_k} = \mathrm{Max}(R_{i_1})$ before all other rows in $I$, Lemma 1 guarantees that the
following rows in $I$ overlap either $R_{i_1}$ or $R_{i_k}$.

Then, assume that there exists a row $X$ between $R_{i_1}$ and $R_{i_k}$ in $I$. We
recurse on $X$. If the line corresponding to $X$ in $TI$ contains an interval, say
$I' = [R'_{i_1} \ldots R'_{i_{k'}}]$, it can be of two types, M or E.

Case 1. If $I'$ is of type M then it will be processed first, before all type E intervals
corresponding to $X$. Then, $X$ is either the first row of the interval or not.
In both cases, as $X$ already appears in $O_{nc}$ by interval $I$, then by concatenating the
rows in the order $R'_{i_1} R'_{i_{k'}} \ldots$ if not already in $O_{nc}$, we guarantee that:

- one of $R'_{i_{k'}} = \mathrm{Max}(R'_{i_1})$ or $R'_{i_1}$ overlaps $X$, which is already placed in
$O_{nc}$, by Lemma 1;

- each following row in $I'$, if any, either overlaps $R'_{i_1}$ or $R'_{i_{k'}}$, or already
appears in $O_{nc}$.

Case 2. If $I'$ is of type E, then $R'_{i_{k'}}$ is not $\mathrm{Max}(R'_{i_1})$. Thus there is no
guarantee that $\mathrm{Max}(R'_{i_1})$ (which has to exist since $I'$ is an interval beginning in
$R'_{i_1}$) has already been placed in $O_{nc}$. Thus we first recurse on $R'_{i_1}$ (step 4-(b))
to guarantee that after some recursion the rows $R'_{i_1}$ and $\mathrm{Max}(R'_{i_1})$ appear
somewhere in $O_{nc}$ before processing $I'$. Then, by Lemma 1, each row following
$R'_{i_1}$ in $I'$ overlaps either $\mathrm{Max}(R'_{i_1})$ or $R'_{i_1}$. As both are already in $O_{nc}$,
we simply concatenate them to $O_{nc}$ in step 4-(c).

Thus, summarizing the 2 cases, when concatenating new rows to $O_{nc}$, we can
ensure that we add either (a) a couple $(X, \mathrm{Max}(X))$, provided that at least one
of those rows overlaps a row $Y$ already placed in $O_{nc}$ (note that if one of those
rows is already in $O_{nc}$, then the result also holds), or (b) a row $X$ that surely
overlaps a row already in $O_{nc}$. Using this approach we identify each overlap class
and at the same time we build a swap overlap order for each overlap class.

Complexity. The time complexity is the same as that of Algorithm
1 or Algorithm 2, that is, $O(|\mathcal{R}|)$.



4 Partitioning Each Overlap Class 

At this point, we have built a swap overlap order for each non-trivial overlap class.
It remains to explain how to test C1P on each such class using this order.

We use a partitioning that is relatively similar to that of [5], except that
instead of being driven by a spanning tree it uses a swap overlap order that
is easier to build since it is in the direct continuation of Dahlhaus's approach
for computing overlap classes. However, the important difference is that using a
swap overlap order we cannot certify that we cut the current partition each time
it is refined by a new row. Instead, we can certify that if the new row $R_1$ does
not cut, the following row $R_2$ will, and $R_1$ will then cut $R_2$. We thus swap the
two rows in the partitioning.

Let us go into details. We maintain an ordered set of sets, called parts, of
columns of $C$. When adding a row, a part $C$ can only be cut in two parts $C'C''$
such that $C' \cup C'' = C$ and $C' \cap C'' = \emptyset$. In the partitioning, $C$ is replaced
by $C'C''$ or $C''C'$ depending on the case, but the general order of the initial
partition is maintained.

To begin the partitioning phase, we consider the first row of the overlap
order $O_{nc} = R_{i_1} R_{i_2} \ldots R_{i_k}$ of overlap class $nc$. We create a first part in our
partition $P_1$ that is composed of the columns of $R_{i_1}$. We then refine this partition
with $R_{i_2}$ by first marking all elements of $R_{i_2}$. Suppose first that $R_{i_2}$ overlaps (or
cuts) $R_{i_1}$ and let $X = R_{i_1} \cap R_{i_2}$. We partition $P_1$ by $R_{i_2}$ in
$P_2 = (R_{i_1} \setminus X)(X)(R_{i_2} \setminus X)$, thus we simply placed all common elements
of $R_{i_1}$ and $R_{i_2}$ on a line in such a way that both $R_{i_1}$ and $R_{i_2}$ are intervals
of $P_2$, which is the core of the C1P.

Let us now consider a new row $R_{i_j}$. We mark the elements of $R_{i_j}$ in $P_{j-1}$.
Suppose again that $R_{i_j}$ cuts a row already integrated to $P_{j-1}$. Let $Y$ be the set of
elements of $R_{i_j}$ that already appear in $P_{j-1}$. Two cases may occur:

(a) if $Y = R_{i_j}$, we only try to group together the elements of $R_{i_j}$ in $P_{j-1}$. If we
can, we only cut the parts accordingly to build $P_j$;

(b) if $Y \neq R_{i_j}$, then we try to cluster the elements of $Y$ on a border (left or
right) of $P_{j-1}$. If we can, we cut the parts accordingly and add a new part
$(R_{i_j} \setminus Y)$ before (resp. after) all parts of $P_{j-1}$ if the border was the left (resp.
right) one to eventually build $P_j$.

Example of partitioning on the first overlap class of our current data set with
the order $R_2 R_3 R_4 R_5 R_7 R_3 R_1 R_9 R_{11}$.

Row   Columns               Partition

R2    {b, c, d}             (bcd)
R3    {c, d, e, f, g, h}    (b)(cd)(efgh)
R4    {d, e}                (b)(c)(d)(e)(fgh)
R5    {e, f, g, h}          (b)(c)(d)(e)(fgh)
R7    {b, h}                fail



The main point of this approach is that if this process fails for a given row,
the overlap class does not verify C1P.

Proposition 2 ([9]). Let $R_{i_1} R_{i_2} \ldots R_{i_k}$ be a total order of the rows of a given
overlap class $nc$ such that each row $R_{i_j}$, $j \ge 2$, overlaps a previous row $R_{i_l}$,
$1 \le l < j$. Then the above partitioning fails if and only if the overlap class $nc$ does
not verify C1P.

Proof. The intuition behind this proposition is that if two rows $R_a$ and $R_b$ overlap,
the intersection $X = R_a \cap R_b$ must lie in between and the only two possible
column orders respecting C1P are $(R_a \setminus X)(X)(R_b \setminus X)$ or
$(R_b \setminus X)(X)(R_a \setminus X)$.

Each part of the partition derives from the intersection of two rows or the
difference of a row and its intersection with the other rows. Thus the order of the
elements inside a part is not relevant and can be changed, but the global order
of all parts is fixed and cannot be changed (not considering a global reversal)
without breaking the C1P of the previous rows. As a consequence,
when adding a new row that overlaps (at least) one row that is already embedded
in the current permutation, C1P will be maintained only if the elements of the
new row can be embedded in $P$ respecting the order of its parts. The fact that
the order of the elements inside each part is not relevant allows us to split some
parts (placed at the extremities of the touched zone) in two subparts, those
touched by the new row on one side, the rest on the other side. This is the only
operation authorized when adding a row to test if we can maintain C1P adding
the new row.

A new row can be embedded in $P$ under those conditions only in the two
cases (a) and (b) described above. Therefore, if the partitioning is feasible,
then the new partition "encodes" all possible column orders for the set of rows
considered up to this point to verify C1P. If not, this ensures that no column
order could be valid for the set of rows to verify C1P. □

In our approach, as we manipulate swap overlap orders, the partitioning
phase must be slightly modified in the following way. Suppose that we want to
refine the partition $P_{j-1}$ with $R_{i_j}$. If $R_{i_j}$ does not overlap any previous row used
in the partitioning, that is if all columns of $R_{i_j}$ either belong to the same part of
$P_{j-1}$ or to none, we swap $R_{i_j}$ and $R_{i_{j+1}}$, refine the partition with $R_{i_{j+1}}$ and only
then with $R_{i_j}$. The swap overlap order guarantees that $R_{i_{j+1}}$ will cut a previous
row, and that $R_{i_j}$ overlaps $R_{i_{j+1}}$. We call this partitioning a swap partitioning.

Theorem 1. Let $R_{i_1} R_{i_2} \ldots R_{i_k}$ be a swap overlap order of the rows of a given
overlap class $nc$. Then the above swap partitioning fails if and only if the overlap
class $nc$ does not verify C1P.



Proof. By swapping the rows when necessary, we ensure that the order of the
rows $R_{i_1} R_{i_2} \ldots R_{i_k}$ in which we refine the partition verifies that each row
$R_{i_j}$, $j \ge 2$, overlaps a previous row $R_{i_l}$, $1 \le l < j$, thus satisfying the
conditions of Proposition 2. □
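
The swap partitioning can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the linear-time implementation: parts are kept as a plain list of sets, the function names `cuts`, `refine` and `swap_partition` are hypothetical, the "does not cut" test also treats a row containing all parts as non-cutting, and in case (b) the right border is tried before the left one.

```python
def cuts(parts, row):
    """False if `row` is inside one part, touches no part, or contains all parts."""
    placed = set().union(*parts)
    touched = [p for p in parts if p & row]
    if not touched or row >= placed:
        return False
    return not (len(touched) == 1 and row <= touched[0])


def refine(parts, row):
    """Refine the ordered partition by a cutting row; return None on failure."""
    n = len(parts)
    idx = [i for i, p in enumerate(parts) if p & row]
    first, last = idx[0], idx[-1]
    if idx != list(range(first, last + 1)):
        return None                       # touched parts are not contiguous
    if any(not parts[i] <= row for i in range(first + 1, last)):
        return None                       # an inner part is only partly touched
    new = row - set().union(*parts)

    def split(i, touched_right):
        inside, outside = parts[i] & row, parts[i] - row
        pieces = [outside, inside] if touched_right else [inside, outside]
        return [p for p in pieces if p]

    if not new:                           # case (a); cuts() gives first < last
        return (parts[:first] + split(first, True) + parts[first + 1:last]
                + split(last, False) + parts[last + 1:])
    # case (b): cluster the touched elements on the right, else on the left
    if last == n - 1 and all(parts[i] <= row for i in range(first + 1, n)):
        return parts[:first] + split(first, True) + parts[first + 1:] + [new]
    if first == 0 and all(parts[i] <= row for i in range(last)):
        return [new] + parts[:last] + split(last, False) + parts[last + 1:]
    return None


def swap_partition(order):
    """Return the final list of parts, or None if the class fails C1P."""
    rows = [set(r) for r in order]
    parts, j = [rows[0]], 1
    while j < len(rows):
        if cuts(parts, rows[j]):
            batch = [rows[j]]
        elif j + 1 < len(rows):
            batch = [rows[j + 1], rows[j]]        # swap the two rows
        else:
            raise ValueError("input is not a swap overlap order")
        for row in batch:
            if not cuts(parts, row):
                raise ValueError("input is not a swap overlap order")
            parts = refine(parts, row)
            if parts is None:
                return None
        j += len(batch)
    return parts


R2, R3, R4, R5, R7 = "bcd", "cdefgh", "de", "efgh", "bh"
```

With these sample rows, `swap_partition([R2, R3, R4, R5])` and the swapped order `swap_partition([R2, R5, R4, R3])` both give the parts (b)(c)(d)(e)(fgh), and adding R7 at the end of the first order makes the partitioning fail.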

Implementation issues. Let us now consider the time complexity of our
partitioning. We show below how it might be implemented in time $O(|O_{nc}|)$ where
$|O_{nc}|$ is the sum of the sizes of all rows belonging to the overlap class.
The data structure we need must allow us to

1. split a part $C$ in $C'C''$ in time proportional to the number of elements of $C$
touched;

2. add a new part to the left or to the right of the current partition in time
proportional to the number of elements added;

3. test if the elements touched can be made consecutive;

4. test if a new row cuts another one already embedded in the partition.

There might be many data structure implementations having these
properties. We propose below a simple one. This structure can also replace that used
in [2] for identifying all $\mathrm{Max}(X)$ used by Dahlhaus's algorithm (see Appendix
B), and thus our whole algorithm only uses a single data structure.

We basically use an array of size $|C|$ to store a stack which encodes a
permutation of elements of $C$. Each cell of this array contains a column and a link
to the part it belongs to. A part is coded as a pair of its beginning and
ending positions in the array, relative to the beginning of the array. A schematic
representation of this data structure is given in Figure 3.



[Figure 3: the array of columns with the Begin and End pointers of the parts.]

Fig. 3. Example continued: implementation of (bcd) and then (efgh)(dc)(b)
when refining $R_2 = \{b, c, d\}$ by $R_3 = \{c, d, e, f, g, h\}$.



Using this data structure, refining a part $C$ by one of its subsets $C''$ can easily be
done in $O(|C''|)$. Indeed, let $[i, j]$ be the bounds of $C$. We swap elements in the
subtable $[i, j]$ to place all $s = |C''|$ elements of $C''$ at the end or at the beginning
of this subtable as necessary. We then adjust the bounds of $C$ to $[i, j - s]$ or
$[i + s, j]$ depending on the case and create a new set $[j - s + 1, j]$ or $[i, i + s - 1]$
to which the $s$ elements of $C''$ now point.
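
The split operation just described can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name `PartArray` and its fields are hypothetical, only the variant that moves $C''$ to the end of the part is shown, and the circular growth of the array used for adding new parts is left out.

```python
class PartArray:
    """Array of columns; each part is a [begin, end] range of the array."""

    def __init__(self, columns):
        self.cells = list(columns)                       # current permutation
        self.pos = {c: i for i, c in enumerate(self.cells)}
        self.bounds = [[0, len(self.cells) - 1]]         # a single initial part
        self.part_of = {c: 0 for c in self.cells}

    def split_end(self, part, subset):
        """Move `subset` (a proper subset of the part) to the end of the part
        and turn it into a new part, in time proportional to len(subset)."""
        i, j = self.bounds[part]
        border = j                      # cells after `border` are already moved
        for c in subset:
            p = self.pos[c]
            other = self.cells[border]
            self.cells[p], self.cells[border] = other, c
            self.pos[c], self.pos[other] = border, p
            border -= 1
        new_part = len(self.bounds)
        self.bounds[part] = [i, border]                  # [i, j - s]
        self.bounds.append([border + 1, j])              # [j - s + 1, j]
        for c in subset:
            self.part_of[c] = new_part
        return new_part


pa = PartArray("abcde")
new_part = pa.split_end(0, {"b", "d"})
```

After the call, the first part covers positions 0 to 2 and the new part covers positions 3 to 4, which hold b and d in some order. Each element of the subset is swapped at most once, since an element displaced by a swap always lands at or before the moving border.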



Adding a new part to the left or to the right of the current partition in time
proportional to the number of elements added is easy, since it suffices to create a
new part and move the pointers of the beginning or ending modulo $|C|$. An example
of such an operation is shown in Figure 3.

Assume that a new row $R$ used for refining cuts a class in the partition $P$,
and let $Y \subseteq R$ be the elements of $R$ that are already in the partition.

If $Y \neq R$, then, to verify C1P, all classes touched by $Y$ must be placed at
an extremity of $P$: all parts from this extremity must be fully touched except
the last one, of which all touched elements have to be placed on the side of the
extremity we considered. All these requirements can easily be checked in time
proportional to the number of elements of $R$, and if they are verified, a new part
containing $R \setminus Y$ is added to the extremity.

If $Y = R$, then to verify C1P there should be a left part that might not be
fully touched, followed by a series (that can be empty) of fully touched parts
and finally a last part that is also not necessarily fully touched. This is also not
difficult to check in $O(|R|)$.

The novelty in our approach is that a new row $R$ might not cut the current
partition, which has to be tested efficiently. This can also easily be checked in
$O(|R|)$ on our structure. Indeed, it suffices to test if $R$ is included in a single
part, in none, or contains all parts. We thus have:

Theorem 2. Testing the C1P of the rows belonging to a same overlap class can
be done in $O(|O_{nc}|)$ time provided a swap overlap order $O_{nc}$ of it.

And eventually:

Corollary 1. Testing the C1P of a family $\mathcal{R}$ can be done in $O(|\mathcal{R}|)$ using a
swap overlap order of each overlap class.

Proof. It suffices to compute all overlap classes of $\mathcal{R}$ using Algorithm 3, which
provides for each overlap class a swap overlap order. Then Theorem 2 ensures
that C1P can be tested on each overlap class in time linear in the total size of the
rows belonging to this class. As overlap classes partition $\mathcal{R}$ and $\mathcal{R}$ verifies C1P
if and only if each overlap class verifies C1P, the whole test can be done in
$O(|\mathcal{R}|)$ time. □

A Trace of Algorithm 3 on Our Example

We trace below the recursive steps of Algorithm 3 on our example for the
identification of the first overlap class while outputting the order
$O_1 = R_2 R_3 R_4 R_5 R_7 R_3 R_1 R_9 R_{11}$.



1. Step 1. $O_{nc} = \varepsilon$, $nc = 1$.

2. $TI[2]$ is chosen in step 2 since it contains an interval of type M.

3. Step $3_1$. We consider $I_1 = M3 = [R_2R_3]$.

4. $3_1$-(a) All occurrences of M3 are removed from $TI$.

5. $3_1$-(b) $O_1 = R_2R_3$, $NC[2] = 1$, $NC[3] = 1$.

6. $3_1$-(c) Recursive call to Step $3_2$ on $R_2$ from M3. We consider
$I_2 = M5 = [R_2R_3]$.

7. $3_2$-(a) All occurrences of M5 are removed from $TI$.

8. $3_2$-(b) As $NC[2] = NC[3] = 1$, no row is added to $O_1$.

9. $3_2$-(c) Recursive call to Step $3_3$ on $R_2$ from M5.

10. Entering Step $4_3$ since there is no more interval of type M in $TI[2]$.
We consider $I_3 = E4 = [R_4R_2]$.

11. $4_3$-(a) All occurrences of E4 are removed from $TI$.

12. $4_3$-(b) Recursive call to Step $3_4$ on $TI[4]$. We consider
$I_4 = M6 = [R_4R_5]$.

13. $3_4$-(a) All occurrences of M6 are removed from $TI$.

14. $3_4$-(b) $O_1 = R_2R_3R_4R_5$, $NC[4] = 1$, $NC[5] = 1$.

15. $3_4$-(c) Recursive call to Step $3_5$ on $TI[4]$. As $TI[4]$ is now empty,
we return to step $3_4$.

16. $3_4$-(c) Recursive call to Step $3_6$ on $TI[5]$. We consider
$I_6 = M7 = [R_7R_5R_3]$.

17. $3_6$-(a) All occurrences of M7 are removed from $TI$.

18. $3_6$-(b) $O_1 = R_2R_3R_4R_5R_7$, $NC[7] = 1$.

19. $3_6$-(c) Recursive call to Step $3_7$ on $R_7$ from M7. We consider
$I_7 = E2 = [R_7R_1R_2R_9]$.

20. $4_7$-(a) All occurrences of E2 are removed from $TI$.



21. $4_7$-(b) Recursive call to Step $3_8$ on $TI[7]$. As $TI[7]$ is now empty,
we return to step $4_7$.

22. $4_7$-(c) $O_1 = R_2R_3R_4R_5R_7R_1R_9$, $NC[1] = 1$, $NC[9] = 1$.

23. $4_7$-(d) Recursive call to Step $3_9$ on $R_1$. We consider
$I_9 = E1 = [R_{11}R_1]$.

24. $4_9$-(a) All occurrences of E1 are removed from $TI$.

25. $4_9$-(b) Recursive call to Step $3_{10}$ on $R_{11}$. We consider
$I_{10} = M8 = [R_{11}R_9]$.

26. $3_{10}$-(a) All occurrences of M8 are removed from $TI$.

27. $3_{10}$-(b) $O_1 = R_2R_3R_4R_5R_7R_1R_9R_{11}$, $NC[11] = 1$.

28. $3_{10}$-(c) Recursive call to Step $3_{11}$ on $TI[11]$. As $TI[11]$ is now
empty, we return to step $3_{10}$.

29. $3_{10}$-(c) Recursive call to Step $3_{12}$ on $TI[9]$. As $TI[9]$ is now
empty, we return to step $3_{10}$, which then also ends, returning to Step $4_9$.

30. $4_9$-(c) Nothing to concatenate from $E1 = [R_{11}R_1]$ since the two rows
are already in $O_1$.

31. $4_9$-(d) Recursive call to Step $3_{13}$ on $TI[1]$. As $TI[1]$ is now
empty, we return to step $4_9$, which also ends, thus returning to Step $4_7$-(d).

32. $4_7$-(d) Recursive call to Step $3_{14}$ on $TI[2]$. As $TI[2]$ is now
empty, we return to step $4_7$-(d).

33. $4_7$-(d) Recursive call to Step $3_{15}$ on $TI[9]$. As $TI[9]$ is now
empty, we return to step $4_7$, which also ends, thus returning to Step $3_6$-(c).

34. $3_6$-(c) Recursive call to Step $3_{16}$ on $R_1$ from M7. As $TI[1]$ is now
empty, we return to step $3_6$-(c).

35. $3_6$-(c) Recursive call to Step $3_{17}$ on $R_2$ from M7. As $TI[2]$ is now
empty, we return to step $3_6$-(c).

36. $3_6$-(c) Recursive call to Step $3_{18}$ on $R_9$ from M7. As $TI[9]$ is now
empty, we return to step $3_6$-(c), which also ends, returning to Step $3_4$-(c).

37. $3_4$-(c) Recursive call to Step $3_{19}$ on $R_3$ from M7. As $TI[3]$ is now
empty, we return to step $3_4$-(c), which also ends, returning to Step $4_3$-(b).

38. $4_3$-(c) Recursive call to Step $3_{20}$ on $R_2$ from E4. As $TI[2]$ is now
empty, we return to step $4_3$-(c), which also ends, returning to Step $3_2$-(c).

39. $3_2$-(c) Recursive call to Step $3_{21}$ on $R_3$ from M5. As $TI[3]$ is now
empty, we return to step $3_2$-(c), which also ends, returning to Step $3_1$-(c).

40. $3_1$-(c) Recursive call to Step $3_{22}$ on $R_3$ from M3. As $TI[3]$ is now
empty, we return to step $3_1$-(c), which also ends, ending the identification of
the first overlap class. The returned order for $nc = 1$ is thus
$O_1 = R_2R_3R_4R_5R_7R_1R_9R_{11}$.



B Computing all Max(X) 



In this appendix we recall the computation of Max($R$), only slightly modified
compared to that published in [2]. The very small modification is that we
impose Max($R$) to be greater than or equal to $R$ in the LR order, while in [2]
the constraint on Max($R$) is only to be of size greater than or equal to that of
$R$. This implies that in [3] and also in the original paper of Dahlhaus [1],
Max($R$) can be after $R$ in the LR order if $|\mathrm{Max}(R)| = |R|$, which in
fact complicates the understanding of the algorithm.

We consider a boolean matrix BM of size $|\mathcal{F}| \times |C|$ such that each
row represents a set $R \in \mathcal{F}$ in the order of LR, and each column an
element $c \in C$. The value $BM[i, j]$ is 1 if and only if $c_j \in R_i$.

We first assume below that all columns of BM are lexicographically sorted.
Figure 4 shows the BM matrix for the set family of Figure 2.

Fig. 4. Example continued: BM matrix whose rows are sorted in LR order and
whose columns are sorted in lexicographic order.



For each $R \in \mathcal{F}$ we denote by left($R$) (resp. right($R$)) the index
of the column of BM containing the leftmost (resp. rightmost) 1 in the row of $R$.



Lemma 2 ([2]). Let $R_1, R_2 \in \mathcal{F}$ such that $R_2$ overlaps $R_1$ in
BM. Then there exists a row $R <_{LR} R_2$ such that
$BM[R, \mathrm{left}(R_1)] = 0$ and $BM[R, \mathrm{right}(R_1)] = 1$.

Lemma 3 ([2]). Let $R_1 \in \mathcal{F}$. Then $\mathrm{Max}(R_1) \neq \emptyset$
if and only if there exists a row $R$ in BM such that
$BM[R, \mathrm{left}(R_1)] = 0$ and $BM[R, \mathrm{right}(R_1)] = 1$ and
verifying $|R| >_{LR} |R_1|$.

Lemma 4 ([2,4]). Let $R_1 \in \mathcal{F}$ such that
$\mathrm{Max}(R_1) \neq \emptyset$. Then $\mathrm{Max}(R_1)$ corresponds to the
highest row $R$ in BM such that $BM[R, \mathrm{left}(R_1)] = 0$ and
$BM[R, \mathrm{right}(R_1)] = 1$.

Dahlhaus's approach for computing all Max($R_1$) is to find the smallest $R$ in
LR order such that $BM[R, \mathrm{left}(R_1)] = 0$ and
$BM[R, \mathrm{right}(R_1)] = 1$. Dahlhaus reduces the problem to LCA
computations, an approach that has been simplified in [5] using partitions.

Computing all Max($R$) using set partitioning. We manipulate sorted partitions
of $V$ that we refine by each $R \in \mathcal{R}$ taken in LR order, that is, in
decreasing order of their sizes. The initial partition is the whole set $C$ and
is denoted $P_0$. The refinement is slightly restricted compared to that of
Section 4, in the sense that a part $C$ is always split into $C'C''$ (and never
$C''C'$) if $C''$ represents the set of elements of $C$ that are in $R$. Refining
a partition $P$ by a set $R \in \mathcal{R}$ consists in successively refining
all parts of $P$. We denote this refinement by $P|_R$.

For example (continued), if $P = \{a\}\{i,j,k,l\}\{b\}\{c,d\}\{e,f,g,h\}$ and
$R = R_4 = \{d,e\}$, then $P|_R = \{a\}\{i,j,k,l\}\{b\}\{c\}\{d\}\{f,g,h\}\{e\}$.
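
The restricted refinement can be sketched in a few lines of Python. This is only
an illustration under simplifying assumptions: the partition is stored as a
plain list of lists and every part is scanned, so the sketch does not have the
running time of the data structure of Section 4. The name `refine` is chosen
for this example and does not come from the paper.

```python
def refine(partition, row):
    """Refine an ordered partition (list of lists) by the set `row`.

    In every part that is split, the elements outside `row` (C') are
    placed before the elements inside `row` (C''), never the reverse.
    """
    refined = []
    for part in partition:
        outside = [c for c in part if c not in row]  # C'
        inside = [c for c in part if c in row]       # C''
        # Keep only the non-empty pieces, in the order C' then C''.
        refined.extend(piece for piece in (outside, inside) if piece)
    return refined


P = [["a"], ["i", "j", "k", "l"], ["b"], ["c", "d"], ["e", "f", "g", "h"]]
R4 = {"d", "e"}
print(refine(P, R4))
# [['a'], ['i', 'j', 'k', 'l'], ['b'], ['c'], ['d'], ['f', 'g', 'h'], ['e']]
```

The printed result matches the partition given in the example above.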

The approach requires 3 steps:

1. refine $P_0$ by all $R \in \mathcal{R}$ taken in LR order;

2. then compute for each $R \in \mathcal{R}$ the values of left($R$) and
right($R$), and sort all $R \in \mathcal{R}$ in a special order with regard to
these values;

3. finally refine $P_0$ again by all $R \in \mathcal{R}$ taken in LR order, but
using the information computed in step 2 to compute all Max($R$).

These 3 steps are detailed below.

Step 1 - Refining $P_0$. Let us consider the final partition we obtain after
refining $P_0$ by each $R \in \mathcal{R}$ taken in LR order. We denote this
partition by $P_f$.

Lemma 5 ([2]). The elements of $P_f$ are sorted according to the lexicographical
order of the columns of BM.

For example (continued), on the data in Figure |ij Pf = {a}{i}{l}{j}{k}{b} 
{c}{d}{h}{f, g}{e}. Note that equal columns of BM are in the same part of Pf 
on which we fix an arbitrary order. 



Step 2 - Computing all left($R$) and right($R$) values. We then compute all
left($R$) and right($R$) values on $P_f$. This can be done easily in
$O(|\mathcal{R}| + n)$ time by scanning each $R \in \mathcal{R}$ and keeping the
minimum and maximum positions of its elements in $P_f$. We also compute a data
structure AM that, for each position $1 \le i \le |V|$ of $P_f$, gives a list of
all $R \in \mathcal{R}$ such that $i = \mathrm{right}(R)$. All those lists are
sorted in increasing order of left($R$). The structure also allows an element
$R \in \mathcal{R}$ to be removed from the list AM[right($R$)] in $O(1)$ time.
This can be ensured, for instance, by using doubly linked lists to implement
each list, and the whole structure can easily be built in $O(n + m)$ time using
bucket sorting.
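
A minimal Python sketch of Step 2 follows. It assumes that the final partition
is given as a flat list `order` of the elements in the order of $P_f$, that
positions are 0-based, and that every row is non-empty. The names
`build_left_right_am`, `order` and `rows` are hypothetical. Plain Python lists
stand in for the doubly linked lists of the text, so the $O(1)$ removal is not
reproduced here.

```python
def build_left_right_am(order, rows):
    """Compute left/right positions and the AM lists.

    `order`: elements listed in the order of the final partition P_f.
    `rows`:  dict mapping a row name to its (non-empty) set of elements.
    """
    pos = {c: i for i, c in enumerate(order)}
    left = {r: min(pos[c] for c in elems) for r, elems in rows.items()}
    right = {r: max(pos[c] for c in elems) for r, elems in rows.items()}

    # Bucket sort: group the rows by their left value first ...
    by_left = [[] for _ in order]
    for r in rows:
        by_left[left[r]].append(r)

    # ... then distribute them by right value, scanning the left buckets in
    # increasing order, so that every AM list is sorted by increasing left.
    am = [[] for _ in order]
    for bucket in by_left:
        for r in bucket:
            am[right[r]].append(r)
    return left, right, am
```

For instance, with `order = list("abcde")` and rows `R1 = {b, c}`, `R2 = {a, c}`,
`R3 = {d, e}`, the list at position 2 is `["R2", "R1"]`, since both rows end at
`c` and `R2` starts further left.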

Step 3 - Refining $P_0$ again and identifying all Max($R$). The main idea
is the following. Assume that at some step of the refinement process in LR order
we refine a part $C = \{c_{i_1}, \ldots, c_{i_k}\}$ of a partition $P$ by
$R_2 \in \mathcal{R}$ and that this results in two non-empty parts $C'C''$.

Lemma 6 ([2]). Let $R \in \mathcal{R}$ such that $|R| < |R_2|$,
$\mathrm{left}(R) \in C'$ and $\mathrm{right}(R) \in C''$. Then
$R_2 = \mathrm{Max}(R)$.

The last phase of the algorithm thus consists in refining $P_0$ again by all
$R_2 \in \mathcal{R}$ taken in LR order. We first initialize all values Max($R$)
to $\emptyset$. Each time a new split $C'C''$ appears (say between positions $l$
and $l + 1$), for all $c \in C''$ the lists AM[$c$] are inspected in the
following way: let $R$ be the top of one of those lists; while
$\mathrm{left}(R) \le l$, $R$ is popped off the list and
$\mathrm{Max}(R) \leftarrow R_2$. After having refined with $R_2$, $R_2$ is
removed from the AM structure.

Lemma 7 ([2]). The above algorithm correctly computes in 3 steps all Max($R$),
$R \in \mathcal{R}$.

The partition refinement can be efficiently implemented using the data
structure presented in Section 4 or that in [5], which is a simpler version of
the first one.

Theorem 3 ([2]). The identification of all Max($R$), $R \in \mathcal{R}$, using
partition refinement can be done in $O(|\mathcal{R}|)$ time.



